batch processing


realSEUDO for real-time calcium imaging analysis

Neural Information Processing Systems

Closed-loop neuroscience experimentation, where recorded neural activity is used to modify the experiment on-the-fly, is critical for deducing causal connections and optimizing experimental time. Thus, while new optical methods permit on-line recording (via multi-photon calcium imaging) and stimulation (via holographic stimulation) of large neural populations, a critical barrier in creating closed-loop experiments that can target and modulate single neurons is the real-time inference of neural activity from streaming recordings. In particular, while multi-photon calcium imaging (CI) is crucial in monitoring neural populations, extracting a single neuron's activity from the fluorescence videos often requires batch processing of the video data. Without batch processing, dimmer neurons and events are harder to identify, and unrecognized neurons can create false positives when computing the activity of known neurons. We solve these issues by adapting a recently proposed robust time-trace estimator---the Sparse Emulation of Unused Dictionary Objects (SEUDO) algorithm---as the basis for a new on-line processing algorithm that simultaneously identifies neurons in the fluorescence video and infers their time traces in a way that is robust to as-yet unidentified neurons. To achieve real-time SEUDO (realSEUDO), we introduce a combination of new algorithmic improvements, a fast C-based implementation, and a new cell-finding loop that enables realSEUDO to identify new cells on-the-fly with no warm-up period. We demonstrate comparable performance to offline algorithms (e.g., CNMF) and improved performance over the current on-line approach (OnACID) at speeds of 120 Hz on average. This is faster than the typical 30 Hz frame rate, leaving critical computation time for computing feedback in a closed-loop setting.
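
Per frame, the core SEUDO estimate reduces to a nonnegative sparse regression: the frame is explained by the known cell profiles plus a dictionary of small Gaussian blobs that stand in for as-yet unidentified cells, with an L1 penalty keeping the blob weights sparse. The sketch below (Python/NumPy, with hypothetical variable shapes and a plain proximal-gradient solver) illustrates that per-frame estimate only; the actual realSEUDO implementation is a fast C-based one with further algorithmic improvements and the cell-finding loop.

```python
import numpy as np

def seudo_frame(y, A, B, lam=0.1, n_iter=200):
    """Robust per-frame time-trace estimate (illustrative sketch).

    y   : (P,) vectorized fluorescence frame
    A   : (P, N) spatial profiles of the N known cells
    B   : (P, M) dictionary of small Gaussian blobs emulating
          unidentified cells (the "unused dictionary objects")

    Solves  min_{c>=0, w>=0}  0.5*||y - A c - B w||^2 + lam*||w||_1
    by projected proximal gradient descent.
    """
    D = np.hstack([A, B])
    N = A.shape[1]
    x = np.zeros(D.shape[1])
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)   # 1/L with L = sigma_max(D)^2
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)               # gradient of the quadratic term
        x = x - step * grad
        x[N:] -= step * lam                    # soft-threshold only blob weights
        x = np.maximum(x, 0.0)                 # nonnegativity projection
    return x[:N]                               # activity of the known cells
```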


LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding

Tong, Junlong, Fu, Jinlan, Lin, Zixuan, Fan, Yingqi, Zhao, Anhao, Su, Hui, Shen, Xiaoyu

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are primarily designed for batch processing. Existing methods for adapting LLMs to streaming rely either on expensive re-encoding or on specialized architectures with limited scalability. This work identifies three key mismatches in adapting batch-oriented LLMs to streaming: (1) input-attention, (2) output-attention, and (3) position-ID mismatches. While it is commonly assumed that the latter two mismatches require frequent re-encoding, our analysis reveals that only the input-attention mismatch significantly impacts performance, indicating that re-encoding outputs is largely unnecessary. To better understand this discrepancy with the common assumption, we provide the first comprehensive analysis of the impact of position encoding on LLMs in streaming, showing that preserving relative positions within source and target contexts is more critical than maintaining absolute order. Motivated by this analysis, we introduce a group position encoding paradigm built on batch architectures to enhance consistency between streaming and batch modes. Extensive experiments on cross-lingual and cross-modal tasks demonstrate that our method outperforms existing approaches. Our method requires no architectural modifications and exhibits strong generalization in both streaming and batch modes. The code is available at https://github.com/EIT-NLP/StreamingLLM.
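
As a toy illustration of the group position encoding idea (a hedged sketch, not the authors' implementation; see the repository above for the real code), position IDs can be assigned per group so that relative positions within the source and target contexts are preserved even when the two streams interleave:

```python
def group_position_ids(token_groups):
    """token_groups: 'src'/'tgt' labels in arrival order.
    Returns position IDs that are contiguous within each group, so relative
    order inside source and target contexts survives stream interleaving."""
    counters, ids = {}, []
    for g in token_groups:
        counters[g] = counters.get(g, 0)
        ids.append(counters[g])
        counters[g] += 1
    return ids

# Interleaved stream: positions restart per group rather than following
# absolute arrival order.
print(group_position_ids(["src", "src", "tgt", "src", "tgt"]))  # [0, 1, 0, 2, 1]
```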


A Diagnosis and Treatment of Liver Diseases: Integrating Batch Processing, Rule-Based Event Detection and Explainable Artificial Intelligence

Chandra, Ritesh, Tiwari, Sadhana, Rastogi, Satyam, Agarwal, Sonali

arXiv.org Artificial Intelligence

Liver diseases pose a significant global health burden, impacting many individuals and having substantial economic and social consequences. Rising liver problems are considered fatal in many countries, such as Egypt and Moldova. This study aims to develop a diagnosis and treatment model for liver disease using Basic Formal Ontology (BFO), the Patient Clinical Data (PCD) ontology, and detection rules derived from a decision tree algorithm. The ontology was developed from the National Viral Hepatitis Control Program (NVHCP) guidelines, which made it more accurate and reliable. The Apache Jena framework uses batch processing to detect events based on these rules, and queries over the detected events can be processed directly with SPARQL. We convert the Decision Tree (DT) and medical-guideline-based rules into Semantic Web Rule Language (SWRL) to operationalize the ontology. To predict different types of liver disease with these SWRL rules using the Pellet and Drools inference engines in the Protege tool, a total of 615 records covering different liver diseases were collected. After the rules are inferred, results are generated for each patient according to the rules, and other patient-related details, along with precautionary suggestions, are obtained based on these results. Explainable Artificial Intelligence (XAI) with open API-based suggestions makes these rule-based suggestions more accurate. When a patient is prescribed a medical test, the model ingests the result using optical character recognition (OCR), and the same process applies when a further medical suggestion is prescribed based on the test report. Together, these models form a comprehensive Decision Support System (DSS) for the diagnosis of liver disease.
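
For a flavor of the rule pipeline, the sketch below shows a decision-tree-style rule of the kind that might be translated into SWRL, written in plain Python for illustration. The attribute names and thresholds are hypothetical, not taken from the NVHCP guidelines.

```python
def detect_liver_events(patient):
    """Rule-based event detection mirroring a DT-derived SWRL rule such as:
    Patient(?p) ^ hasALT(?p, ?alt) ^ swrlb:greaterThan(?alt, 55)
        -> hasCondition(?p, ElevatedALT)
    (Illustrative attributes and thresholds only.)"""
    events = []
    if patient.get("ALT", 0) > 55:           # U/L, hypothetical cutoff
        events.append("ElevatedALT")
    if patient.get("bilirubin", 0) > 1.2:    # mg/dL, hypothetical cutoff
        events.append("Hyperbilirubinemia")
    return events

print(detect_liver_events({"ALT": 80, "bilirubin": 0.9}))  # ['ElevatedALT']
```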


Probabilistic Quantum SVM Training on Ising Machine

He, Haoqi, Xiao, Yan

arXiv.org Artificial Intelligence

Quantum computing holds significant potential to accelerate machine learning algorithms, especially in solving optimization problems like those encountered in Support Vector Machine (SVM) training. However, current QUBO-based Quantum SVM (QSVM) methods rely solely on binary optimal solutions, limiting their ability to identify fuzzy boundaries in data. Additionally, the limited qubit count in contemporary quantum devices constrains training on larger datasets. In this paper, we propose a probabilistic quantum SVM training framework suitable for Coherent Ising Machines (CIMs). By formulating the SVM training problem as a QUBO model, we leverage CIMs' energy minimization capabilities and introduce a Boltzmann distribution-based probabilistic approach to better approximate optimal SVM solutions, enhancing robustness. To address qubit limitations, we employ batch processing and multi-batch ensemble strategies, enabling small-scale quantum devices to train SVMs on larger datasets and support multi-class classification tasks via a one-vs-one approach. Our method is validated through simulations and real-machine experiments on binary and multi-class datasets. On the banknote binary classification dataset, our CIM-based QSVM, utilizing an energy-based probabilistic approach, achieved up to 20% higher accuracy compared to the original QSVM, while training up to $10^4$ times faster than simulated annealing methods. Compared with classical SVM, our approach either matched or reduced training time. On the IRIS three-class dataset, our improved QSVM outperformed existing QSVM models in all key metrics. As quantum technology advances, increased qubit counts are expected to further enhance QSVM performance relative to classical SVM.
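
The sketch below shows one plausible way to cast the kernel SVM dual as a QUBO and to form a Boltzmann-weighted estimate over sampled solutions. The bit precision, penalty weight, and temperature are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def svm_qubo(K, y, bits=2, penalty=5.0):
    """QUBO for the SVM dual  min 0.5*a^T (yy^T * K) a - 1^T a
    s.t. y^T a = 0, a >= 0, with each alpha_i encoded as
    sum_k 2^k * b_{ik} over `bits` binary variables and the equality
    constraint added as a quadratic penalty."""
    n = len(y)
    E = np.zeros((n, n * bits))              # alpha = E @ b
    for i in range(n):
        for k in range(bits):
            E[i, i * bits + k] = 2.0 ** k
    H = 0.5 * (np.outer(y, y) * K)           # quadratic part of the dual
    Q = E.T @ H @ E - np.diag(E.sum(axis=0))      # -1^T alpha on the diagonal
    Q += penalty * (E.T @ np.outer(y, y) @ E)     # (y^T alpha)^2 penalty
    return Q

def boltzmann_alphas(samples, energies, E, T=1.0):
    """Boltzmann-weighted average over sampled binary solutions, rather than
    committing to the single lowest-energy sample."""
    w = np.exp(-(np.array(energies) - min(energies)) / T)
    w /= w.sum()
    b_avg = (w[:, None] * np.array(samples, float)).sum(axis=0)
    return E @ b_avg                         # probabilistic alpha estimate
```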


Position-Aware Depth Decay Decoding ($D^3$): Boosting Large Language Model Inference Efficiency

Fan, Siqi, Fang, Xuezhi, Xing, Xingrun, Han, Peng, Shang, Shuo, Wang, Yequan

arXiv.org Artificial Intelligence

Due to the large number of parameters, the inference phase of Large Language Models (LLMs) is resource-intensive. Unlike traditional model compression, which needs retraining, recent dynamic computation methods show that not all components are required for inference, enabling a training-free pipeline. In this paper, we focus on the dynamic depth of LLM generation, proposing a token-position-aware layer-skipping framework that saves 1.5x the operations while maintaining performance. We first observe that tokens predicted later have lower perplexity and thus require less computation. We then propose a training-free algorithm called Position-Aware Depth Decay Decoding ($D^3$), which leverages a power-law decay function, $\left\lfloor L \times (\alpha^i) \right\rfloor$, to determine the number of layers to retain when generating token $T_i$. Remarkably, without any retraining, $D^3$ achieves success across a wide range of generation tasks for the first time. Experiments on large language models (i.e., the Llama family) with $7 \sim 70$ billion parameters show that $D^3$ can achieve an average 1.5x speedup compared with the full-inference pipeline while maintaining comparable performance, with nearly no performance drop ($<1\%$) on the GSM8K and BBH benchmarks.
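
The retention schedule itself is a one-liner; here is a minimal sketch, with an illustrative $\alpha$ and without whatever minimum-depth floor a practical deployment would add:

```python
import math

def layers_to_retain(L, alpha, i):
    """Layers kept when generating token T_i: floor(L * alpha^i)."""
    return math.floor(L * (alpha ** i))

L, alpha = 32, 0.99                # alpha chosen for illustration only
print([layers_to_retain(L, alpha, i) for i in range(0, 100, 20)])
# [32, 26, 21, 17, 14] -- later tokens run through fewer layers
```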


Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions

Barrak, Amine, Ksontini, Emna

arXiv.org Artificial Intelligence

As data-intensive applications grow, batch processing in limited-resource environments faces scalability and resource-management challenges. Serverless computing offers a flexible alternative, enabling dynamic resource allocation and automatic scaling. This paper explores how serverless architectures can make large-scale ML inference tasks faster and more cost-effective by decomposing monolithic processes into parallel functions. Through a case study on sentiment analysis using the DistilBERT model and the IMDb dataset, we demonstrate that serverless parallel processing can reduce execution time by over 95% compared to monolithic approaches, at the same cost.
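
The decomposition pattern is straightforward: split the dataset into chunks and fan each chunk out to its own worker. In the paper each worker is a serverless function invocation; in the hedged sketch below a thread pool stands in for those invocations, and classify_chunk is a placeholder for the DistilBERT sentiment call.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_chunk(reviews):
    # Placeholder: in a real deployment this body runs inside a serverless
    # function that loads DistilBERT and labels each review.
    return [{"text": r, "label": "POSITIVE"} for r in reviews]

def parallel_batch_inference(reviews, chunk_size=100, max_workers=32):
    chunks = [reviews[i:i + chunk_size]
              for i in range(0, len(reviews), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(classify_chunk, chunks)   # one "function" per chunk
    return [item for chunk in results for item in chunk]
```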


Reviews: Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure

Neural Information Processing Systems

The paper proposes a method for optimization problems often found in machine learning tasks. The general loss function to minimize is a sum of smooth convex functions plus a convex regularization term. The method is designed for the case where perturbations are introduced in the data. Since the data sampling introduces a stochastic component, Stochastic Gradient Descent (SGD) needs modifications to reduce the gradient variance [14, 28]. In the case of perturbed data, this variance is magnified.
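
For readers outside the area, the variance-reduction modification the review alludes to is typified by SVRG: the stochastic gradient is corrected with a control variate built from a periodically recomputed full gradient. A generic sketch follows (not the paper's method, which extends this idea to perturbed/infinite data):

```python
import numpy as np

def svrg(grad_i, n, w0, lr=0.1, epochs=10, m=None):
    """SVRG for min_w (1/n) * sum_i f_i(w).
    grad_i(w, i) returns the gradient of the i-th component at w."""
    w = w0.copy()
    m = m or n
    for _ in range(epochs):
        w_snap = w.copy()
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n   # full gradient
        for _ in range(m):
            i = np.random.randint(n)
            # Unbiased, variance-reduced stochastic gradient: its variance
            # shrinks as w approaches the snapshot point.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= lr * g
    return w
```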


Transforming Agriculture with Intelligent Data Management and Insights

Pan, Yu, Sun, Jianxin, Yu, Hongfeng, Bai, Geng, Ge, Yufeng, Luck, Joe, Awada, Tala

arXiv.org Artificial Intelligence

Modern agriculture faces grand challenges to meet increased demands for food, fuel, feed, and fiber with population growth under the constraints of climate change and dwindling natural resources. Data innovation is urgently required to secure and improve the productivity, sustainability, and resilience of our agroecosystems. As various sensors and Internet of Things (IoT) instrumentation become more available, affordable, reliable, and stable, it has become possible to conduct data collection, integration, and analysis at multiple temporal and spatial scales, in real-time, and with high resolutions. At the same time, the sheer amount of data poses a great challenge to data storage and analysis, and the de facto data management and analysis practices adopted by scientists have become increasingly inefficient. Additionally, the data generated from different disciplines, such as genomics, phenomics, environmental science, agronomy, and socioeconomics, can be highly heterogeneous. That is, datasets across disciplines often do not share the same ontology, modality, or format. All of the above make it necessary to design a new data management infrastructure that implements the principles of Findable, Accessible, Interoperable, and Reusable (FAIR). In this paper, we propose Agriculture Data Management and Analytics (ADMA), which satisfies the FAIR principles. Our new data management infrastructure is intelligent by supporting semantic data management across disciplines; interactive by providing various data management and analysis portals such as a web GUI, command line, and API; scalable by utilizing the power of high-performance computing (HPC); extensible by allowing users to load their own data analysis tools; trackable by keeping track of the different operations on each file; and open by using a rich set of mature open-source technologies.


Managing GPU Costs for Production AI

#artificialintelligence

As teams integrate ML/AI models into production systems running at scale, they're increasingly encountering a new obstacle: high GPU costs. While GPUs are used in both model training and production inference, it's tough to find savings or efficiencies in the training process: training is costly because it's time-intensive, but fortunately it's likely not happening every day. This blog focuses on optimizations you can make to generate cost savings while using GPUs for running inference in production. The first part provides some general recommendations for using GPUs more efficiently, while the second walks through steps you can take to optimize GPU usage with commonly used architectures.